ABSTRACT

We have discovered a population of extremely red galaxies at z ~ 1.5 which have apparent stellar ages of more than 3 Gyr, based on detailed spectroscopy in the rest-frame ultraviolet. For galaxies to have existed at the high collapse redshifts indicated by these ages, there must be a minimum level of power in the density fluctuation spectrum on galaxy scales. This paper compares the required power with that inferred from other high-redshift populations: damped Lyman-α absorbers and Lyman-limit galaxies at z ~ 3.2. If the collapse redshifts of the old red galaxies are in the range z_c ~ 6-8, the various tracers agree on the required inhomogeneity on 1-Mpc scales. This level of small-scale power requires the Lyman-limit galaxies to be approximately ν ~ 3.0 fluctuations, implying a very large bias parameter, b ~ 6. If the collapse redshifts of the red galaxies are indeed in the range z_c = 6-8 required for power-spectrum consistency, their implied ages at z ~ 1.5 are between 3 and 3.8 Gyr for essentially any model universe of current age 14 Gyr. The age of these objects as deduced from gravitational collapse thus provides independent support for the ages estimated from their stellar populations. Such early-forming galaxies are rare, and their contribution to the cosmological stellar density is consistent with an extrapolation to higher redshifts of the star-formation rate measured at z < 5; there is no evidence for a general era of spheroid formation at extreme redshifts.

Key words: galaxies: clustering - cosmology: theory - large-scale structure of Universe.



1 INTRODUCTION 

It is widely believed that the sequence of cosmological structure formation was hierarchical, originating in a density power spectrum with increasing fluctuations on small scales. The large-wavelength portion of this spectrum is accessible to observation today through studies of galaxy clustering in the linear and quasilinear regimes. However, nonlinear evolution has effectively erased any information on the initial spectrum for wavelengths below about 1 Mpc. The most sensitive way of measuring the spectrum on smaller scales is via the abundances of high-redshift objects: the amplitude of fluctuations on the scales of individual galaxies governs the redshift at which these objects first undergo gravitational collapse.

The aim of this paper is to apply these arguments about the small-scale spectrum to a particularly interesting class of galaxy which we have recently discovered. It has long been apparent that a significant fraction of the optical identifications of 1-mJy radio galaxies are red and inactive (Windhorst, Kron & Koo 1984; Kron, Koo & Windhorst 1985).



More recently, we have obtained the deep absorption-line spectroscopy needed to prove that these colours result from a well-evolved stellar population. The minimum age of the stars can be inferred robustly from spectral breaks, and gives ages of 3.5 Gyr for 53W091 at z = 1.55 (Dunlop et al. 1996; Spinrad et al. 1997), and 4.0 Gyr for 53W069 at z = 1.43 (Dunlop 1998; Dey et al. 1998). Such ages push the formation era for these galaxies back to extremely high redshifts, and it is of interest to ask what level of small-scale power is needed in order to allow this early formation. However, the dating of stellar populations rests on complex modelling, and so it is desirable to have an independent way of checking whether these high collapse redshifts are correct. We have carried out such a test, using the fact that the abundances of early-forming galaxies are sensitive to the amplitude of the small-scale power spectrum. Requiring a level of small-scale power consistent with that implied by other high-redshift objects predicts a collapse redshift for our red galaxies. From this, we can predict an age, which can then be compared with the ages obtained by analysing the stellar populations.

We shall adopt a standard framework for interpreting the abundances of high-redshift objects in terms of structure-formation models, as outlined by Efstathiou & Rees (1988). Under the assumption that the growth of structure proceeds as a gravitational hierarchy with Gaussian primordial statistics, the abundance of objects of a given mass is related directly to the rms density fluctuations on that mass scale. In Section 2, we summarize the necessary elements of this Press-Schechter theory. It will be important to achieve a consistent picture in this analysis between these observations of high-redshift density fluctuations and the present-day fluctuation spectrum deduced from galaxy clustering; Section 3 summarizes our knowledge of the large-scale spectrum. We then assemble the data on masses and abundances of high-redshift galaxies in Section 4, summarizing both our own results and those for other classes of high-redshift object. The implied density fluctuation spectrum is then discussed in Section 5, where we note that these results require a high level of bias for rare high-redshift galaxies. Finally, we return in Section 6 to the question of stellar ages in our red radio galaxies, in the light of the collapse redshifts implied by the constraints on small-scale density fluctuations.

2 PRESS-SCHECHTER APPARATUS 

The formalism of Press & Schechter (1974) gives a way of calculating the fraction F_c of the mass in the universe which has collapsed into objects more massive than some limit M:

$$F_c(>M, z) = \mathrm{erfc}\left[\frac{\delta_c}{\sqrt{2}\,\sigma(M)}\right]. \qquad (1)$$
Here, σ(M) is the rms fractional density contrast obtained by filtering the linear-theory density field on the required scale. In practice, this filtering is usually performed with a spherical 'top-hat' filter of radius R, with a corresponding mass of 4πρ_b R³/3, where ρ_b is the background density. The number δ_c is the linear-theory critical overdensity, which for a 'top-hat' overdensity undergoing spherical collapse is 1.686, virtually independent of Ω. This form describes numerical simulations very well (see e.g. Ma & Bertschinger 1994). The main assumption is that the density field obeys Gaussian statistics, which is true in most inflationary models. Given some estimate of F_c, the number σ(R) can then be inferred. Note that for rare objects this is a pleasingly robust process: a large error in F_c will give only a small error in σ(R), because the abundance is exponentially sensitive to σ.
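As a minimal illustration of equation (1) and of the robustness argument, the sketch below evaluates the Press-Schechter collapsed fraction F_c = erfc[δ_c/(√2 σ)] and inverts it for σ. It uses only the Python standard library; the function names are hypothetical, and the inversion relies on the identity erfc(x/√2) = 2[1 - Φ(x)], with Φ the standard normal CDF.

```python
import math
from statistics import NormalDist

DELTA_C = 1.686  # linear-theory critical overdensity for top-hat spherical collapse


def collapsed_fraction(sigma, delta_c=DELTA_C):
    """Equation (1): fraction of mass in objects above the scale whose rms is sigma."""
    return math.erfc(delta_c / (math.sqrt(2.0) * sigma))


def sigma_from_fraction(f_c, delta_c=DELTA_C):
    """Invert equation (1) using erfc(nu / sqrt(2)) = 2 * (1 - Phi(nu))."""
    nu = NormalDist().inv_cdf(1.0 - 0.5 * f_c)  # peak height nu = delta_c / sigma
    return delta_c / nu


# A factor-10 change in a small collapsed fraction moves sigma by only a few tens of per cent.
for f_c in (3e-4, 3e-3, 3e-2):
    print(f"F_c = {f_c:.0e}  ->  sigma = {sigma_from_fraction(f_c):.3f}")
```

Working through the normal quantile rather than a hand-rolled erfc inverse keeps the sketch dependency-free and numerically stable in the rare-object tail.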

Total masses are of course ill-defined both for real astronomical objects and for clumps of particles in simulations; a better quantity to use is the velocity dispersion. Virial equilibrium for a halo of mass M and proper radius r demands a circular orbital velocity of

$$V_c^2 = \frac{GM}{r}. \qquad (2)$$

For a spherically collapsed object, this velocity can be converted directly into a Lagrangian comoving radius which contains the mass of the object within the virialization radius (e.g. White, Efstathiou & Frenk 1993):

$$\frac{R}{h^{-1}\,\mathrm{Mpc}} = 2^{1/2}\left[\frac{V_c}{100\,\mathrm{km\,s^{-1}}}\right]\Omega_m^{-1/2}\,(1+z_c)^{-1/2}\,f_c^{-1/6} \qquad (3)$$

(h ≡ H_0/100 km s^-1 Mpc^-1). Here, z_c is the redshift of virialization; Ω_m is the present value of the matter density parameter; f_c is the density contrast at virialization of the newly collapsed object relative to the background, which is adequately approximated by

$$f_c = 178\,/\,\Omega_m^{0.6}(z_c), \qquad (4)$$

with only a slight sensitivity to whether Λ is non-zero (Eke, Cole & Frenk 1996).
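A sketch of equations (3) to (5), assuming the approximation f_c = 178 Ω_m(z_c)^-0.6 for the virial contrast. The caller must supply Ω_m at the collapse redshift (equal to 1 throughout for Ω = 1); all function names are illustrative, not taken from any existing code.

```python
import math


def virial_contrast(omega_m_at_zc):
    """Equation (4): density contrast at virialization relative to the background."""
    return 178.0 / omega_m_at_zc ** 0.6


def lagrangian_radius(v_c, z_c, omega_m=1.0, omega_m_at_zc=1.0):
    """Equation (3): comoving Lagrangian radius in h^-1 Mpc for circular velocity v_c (km/s)."""
    return (math.sqrt(2.0) * (v_c / 100.0) * omega_m ** -0.5
            * (1.0 + z_c) ** -0.5 * virial_contrast(omega_m_at_zc) ** (-1.0 / 6.0))


def circular_velocity(sigma_v):
    """Equation (5) inverted for an isothermal sphere: V_c = sqrt(2) * sigma_v."""
    return math.sqrt(2.0) * sigma_v


# Example: a 250 km/s elliptical collapsing at z_c = 6 in an Omega = 1 universe.
R = lagrangian_radius(circular_velocity(250.0), z_c=6.0)
print(f"R = {R:.2f} h^-1 Mpc")
```

Because R is linear in V_c, the horizontal error bars in a σ(R) plot follow directly from the quoted velocity ranges.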

For isothermal-sphere haloes, the velocity dispersion is

$$\sigma_v = V_c/\sqrt{2}. \qquad (5)$$



Given a formation redshift of interest and a velocity dispersion, there is then a direct route to the Lagrangian radius from which the proto-object collapsed. It is sometimes argued that the observed stellar velocity dispersion should be a little 'cooler' than that of the dark-matter halo hosting the galaxy (by a factor √(3/2) for an r^-3 stellar density profile in an isothermal sphere, for example). However, this assumes that the dark matter totally dominates the gravity, whereas real galaxies are baryon dominated in the centre. Any velocity correction is therefore likely to be small in practice, and we ignore the effect.

The Press-Schechter collapsed fraction can now be converted to a differential number density of objects, n(M), using

$$M\,n(M) = \rho_b \left|\frac{dF_c}{dM}\right|. \qquad (6)$$



In practice, however, one is more likely to measure an integrated number density N of objects which lie above some mass threshold, in which case

$$F_c = \frac{MN}{\rho_b} = \frac{4\pi R^3 N}{3}. \qquad (7)$$



This just says that the collapsed fraction of the mass is the fraction of the volume contained in the Lagrangian spheres around each object. As argued above, even quite large uncertainties in F_c can have little effect on the implied value of σ(R). This allows us to neglect the uncertain constant of proportionality in the above relation; for F_c ∝ M^-ε, the exact relation is

$$F_c = \frac{\epsilon + 1}{\epsilon}\,\frac{MN}{\rho_b}. \qquad (8)$$

Similarly, the uncertainties in estimating the appropriate value of R from the observed circular velocities are often unimportant. Strictly, the observations are lower limits: we must make at least enough collapsed objects to host the galaxies under study, but some objects of this mass may give rise to galaxies of a different type. However, the results are highly robust to substantial changes in the assumed abundance, so we shall treat them as measurements.
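To see how little the uncertain factor (ε + 1)/ε matters, the sketch below converts an integrated number density N into F_c = (4π/3)R³N, optionally applies the correction, and compares the implied σ. The density, radius and slope used here are purely hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist

DELTA_C = 1.686


def fraction_from_counts(n_comoving, r_lagrangian, epsilon=None):
    """F_c = (4 pi / 3) R^3 N, optionally multiplied by (eps + 1) / eps."""
    f_c = 4.0 * math.pi / 3.0 * r_lagrangian ** 3 * n_comoving
    if epsilon is not None:
        f_c *= (epsilon + 1.0) / epsilon  # correction for F_c proportional to M^-eps
    return f_c


def sigma_from_fraction(f_c, delta_c=DELTA_C):
    """Invert the Press-Schechter relation F_c = erfc(delta_c / (sqrt(2) sigma))."""
    return delta_c / NormalDist().inv_cdf(1.0 - 0.5 * f_c)


# Hypothetical inputs: N = 1e-3 (h^-1 Mpc)^-3, R = 1 h^-1 Mpc, slope eps = 1 (a factor 2).
plain = sigma_from_fraction(fraction_from_counts(1e-3, 1.0))
corrected = sigma_from_fraction(fraction_from_counts(1e-3, 1.0, epsilon=1.0))
print(f"sigma changes by a factor {corrected / plain:.3f}")
```

Even doubling F_c shifts σ by under 10 per cent in this example, which is the sense in which the constant of proportionality can be neglected.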

The number densities require a cosmological model, and we quote figures assuming Ω = 1. For specific calculations, we scale to other values of Ω using

$$n\,D^2\,dr = n_1\,D_1^2\,dr_1, \qquad (9)$$

where dr is the increment of comoving distance, D is the angular-diameter distance (D = R_0 S_k(r)/[1 + z]; see the Appendix), and the subscript 1 denotes the Ω = 1 values. The scaling of F_c with model is different, because the inferred density of baryonic material depends on the element of radial distance only:

$$F_c\,dr = F_{c,1}\,dr_1. \qquad (10)$$
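The scaling n D² dr = n_1 D_1² dr_1 can be sketched numerically. Assuming a matter-plus-Λ model with curvature Ω_k = 1 - Ω_m - Ω_Λ and no radiation, the code below computes the comoving volume element per unit redshift, proportional to [R_0 S_k(r)]² dr/dz. The (1 + z)² relating this to D² dr is identical in every model at fixed z, so it cancels in the ratio. Distances are in units of c/H_0 and all names are illustrative.

```python
import math


def hubble_e(z, omega_m, omega_l):
    """E(z) = H(z)/H0 for matter + curvature + Lambda (radiation neglected)."""
    omega_k = 1.0 - omega_m - omega_l
    return math.sqrt(omega_m * (1 + z) ** 3 + omega_k * (1 + z) ** 2 + omega_l)


def comoving_distance(z, omega_m, omega_l, steps=10000):
    """Radial comoving distance r in units of c/H0 (midpoint rule)."""
    dz = z / steps
    return dz * sum(1.0 / hubble_e((i + 0.5) * dz, omega_m, omega_l)
                    for i in range(steps))


def transverse_distance(z, omega_m, omega_l):
    """R0 S_k(r) in units of c/H0: sinh for open, sin for closed, r for flat."""
    r = comoving_distance(z, omega_m, omega_l)
    omega_k = 1.0 - omega_m - omega_l
    if omega_k > 1e-12:
        return math.sinh(math.sqrt(omega_k) * r) / math.sqrt(omega_k)
    if omega_k < -1e-12:
        return math.sin(math.sqrt(-omega_k) * r) / math.sqrt(-omega_k)
    return r


def volume_element(z, omega_m, omega_l):
    """Comoving volume per unit z per steradian, [R0 S_k(r)]^2 dr/dz."""
    return transverse_distance(z, omega_m, omega_l) ** 2 / hubble_e(z, omega_m, omega_l)


def density_scale(z, omega_m, omega_l):
    """n / n_1 at fixed z, from n D^2 dr = n_1 D_1^2 dr_1."""
    return volume_element(z, 1.0, 0.0) / volume_element(z, omega_m, omega_l)


for label, om, ol in (("open", 0.3, 0.0), ("flat", 0.3, 0.7)):
    print(f"z = 3, Omega_m = 0.3 {label}: n / n_1 = {density_scale(3.0, om, ol):.3f}")
```

Low-density models have larger volume elements at high redshift, so a fixed observed count corresponds to a lower comoving density than in the Ω = 1 case.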

3 POWER SPECTRA FROM GALAXY CLUSTERING

The small-scale σ(R) data have to be related to present-day observations of large-scale fluctuations in order to make a consistent picture. The present linear fluctuation spectrum is known, independent of uncertainties about bias, for R > 3 h^-1 Mpc (Peacock 1997). We summarize here the results of these studies.

We use a dimensionless notation for the power spectrum: Δ²(k) is the contribution to the fractional density variance per unit ln k. In the convention of Peebles (1980), this is

$$\Delta^2(k) \equiv \frac{d\sigma^2}{d\ln k} = \frac{V}{(2\pi)^3}\,4\pi k^3\,|\delta_k|^2 \qquad (11)$$

(V being a normalization volume), and the relation to the correlation function is

$$\xi(r) = \int \Delta^2(k)\,\frac{\sin kr}{kr}\,\frac{dk}{k}. \qquad (12)$$



Similarly, the variance in fractional density contrast averaged over spheres of radius R is

$$\sigma^2(R) = \int \Delta^2(k)\,W_k^2\,\frac{dk}{k}, \qquad (13)$$

where W_k = 3(sin y - y cos y)/y³ and y = kR.
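A minimal numerical version of the top-hat integral σ²(R) = ∫ Δ²(k) W²(kR) dk/k. Purely so the answer can be checked, it assumes a hypothetical power-law spectrum Δ²(k) = k^(n+3) with n = -2: for a power law, σ(R) ∝ R^-(n+3)/2, so σ(2)/σ(1) should equal 2^-1/2. The integral uses the trapezoid rule in ln k; a real spectrum would replace the power law.

```python
import math


def tophat_window(y):
    """Top-hat window W = 3 (sin y - y cos y) / y^3, with y = kR."""
    if y < 1e-3:
        return 1.0 - y * y / 10.0  # Taylor series avoids cancellation at small y
    return 3.0 * (math.sin(y) - y * math.cos(y)) / y ** 3


def sigma_tophat(R, delta2, lnk_min=-10.0, lnk_max=6.0, steps=8000):
    """sigma(R) from sigma^2 = integral of Delta^2(k) W^2(kR) dln k (trapezoid rule)."""
    dlnk = (lnk_max - lnk_min) / steps
    total = 0.0
    for i in range(steps + 1):
        k = math.exp(lnk_min + i * dlnk)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * delta2(k) * tophat_window(k * R) ** 2
    return math.sqrt(total * dlnk)


def powerlaw(k, n=-2.0):
    """Hypothetical test spectrum Delta^2(k) = k^(n + 3), k in h Mpc^-1."""
    return k ** (n + 3.0)


ratio = sigma_tophat(2.0, powerlaw) / sigma_tophat(1.0, powerlaw)
print(f"sigma(2)/sigma(1) = {ratio:.4f}, power-law expectation {2 ** -0.5:.4f}")
```

Integrating in ln k matches the dimensionless Δ² convention and keeps the grid uniform over several decades of scale.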

The twin problems with galaxy clustering are (i) the power-spectrum measurements are nonlinear, rather than the linear-theory power spectrum required by the Press-Schechter method; (ii) the normalization and even the shape of the galaxy spectrum are biased relative to those of the mass. The first problem can be dealt with by calibrating the nonlinear effects using N-body simulations. The second is more difficult, but soluble in a number of limits. First, the overall normalization can be determined by the Press-Schechter method of Section 2, as applied to rich clusters. This gives a measurement of the rms in spheres of radius 8 h^-1 Mpc, on which there is general agreement:

$$\sigma_8 = [0.5 - 0.6]\,\Omega^{-0.56} \qquad (14)$$

(Henry & Arnaud 1989; White, Efstathiou & Frenk 1993; Viana & Liddle 1996; Eke, Cole & Frenk 1996). On smaller scales, bias is expected to steepen the galaxy correlations, but this effect operates on the nonlinear data, and so has a small effect on the inferred linear spectrum for R > 3 h^-1 Mpc (Peacock 1997).

The resulting spectrum shape appears to be inconsistent with any variant of pure Cold Dark Matter, and is better described by Mixed Dark Matter with roughly a 30 per cent admixture of light neutrinos (e.g. Klypin et al. 1993; Peacock 1997; Smith et al. 1997). We are now interested in seeing how well this spectrum matches onto the smaller-scale data obtained from abundances of high-redshift galaxies.

4 DATA ON HIGH-REDSHIFT GALAXY ABUNDANCES

In addition to our red mJy galaxies, two classes of high-redshift object have been used recently to set constraints on the small-scale power spectrum at high redshift.

4.1 Damped Lyman-α systems

Damped Lyman-α absorbers are systems with H I column densities greater than ~2 × 10^24 m^-2 (Lanzetta et al. 1991). If the fraction of baryons in the virialized dark-matter haloes equals the global value Ω_B, then data on these systems can be used to infer the total fraction of matter that has collapsed into bound structures at high redshifts (Ma & Bertschinger 1994; Mo & Miralda-Escudé 1994; Kauffmann & Charlot 1994; Klypin et al. 1995). The highest measurement, at ⟨z⟩ ~ 3.2, implies Ω_HI ≈ 0.0025 h^-1 (Lanzetta et al. 1991; Storrie-Lombardi, McMahon & Irwin 1996). We take Ω_B h² = 0.02 as a compromise between the lower Walker et al. (1991) nucleosynthesis estimate and the more recent estimate of 0.025 from Tytler et al. (1996), giving

for these systems. In this case alone, an explicit value of h is required in order to obtain the collapsed fraction; we take h = 0.65.
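Why h enters explicitly here can be seen from a rough arithmetic sketch. Assuming, purely for illustration, that the collapsed fraction is simply F_c ≈ Ω_HI/Ω_B (ignoring any correction for helium or ionized gas, which a full treatment may include), the h^-1 in Ω_HI and the h^-2 in Ω_B do not cancel, leaving F_c ∝ h. The function name is hypothetical.

```python
def dla_collapsed_fraction(h, omega_hi_h=0.0025, omega_b_h2=0.02):
    """Rough F_c = Omega_HI / Omega_B, with Omega_HI = 0.0025/h and Omega_B = 0.02/h^2."""
    omega_hi = omega_hi_h / h
    omega_b = omega_b_h2 / h ** 2
    return omega_hi / omega_b  # equals 0.125 * h, so h must be specified


print(f"F_c(h = 0.65) ~ {dla_collapsed_fraction(0.65):.3f}")
```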

The photoionizing background prevents virialized gaseous systems with circular velocities of less than about 50 km s^-1 from cooling efficiently, so that they cannot contract to the high density contrasts characteristic of galaxies (e.g. Efstathiou 1992). We follow Mo & Miralda-Escudé (1994) and use the circular velocity range 50-100 km s^-1 (σ_v = 35-70 km s^-1) to model the damped Lyman-α systems. Reinforcing the photoionization argument, detailed hydrodynamic simulations imply that the absorbers are not expected to be associated with very massive dark-matter haloes (Haehnelt, Steinmetz & Rauch 1997). This assumption is consistent with the rather low-luminosity galaxies detected in association with the absorbers in a number of cases (Le Brun et al. 1996).

4.2 Lyman-limit galaxies 

Steidel et al. (1996) observed star-forming galaxies between 
z = 3 and 3.5 by looking for objects with a spectral break 
redwards of the U band. Our treatment of these Lyman-limit 
galaxies is similar to that of Mo & Fukugita (1996), who 
compared the abundances of these objects to predictions 
from various models. Steidel et al. give the comoving density 
of their galaxies as 

$$N(\Omega = 1) \simeq 10^{-2.54}\,(h^{-1}\,\mathrm{Mpc})^{-3}. \qquad (16)$$

This is a high number density, comparable to that of L* galaxies in the present Universe. The mass of L* galaxies corresponds to collapse of a Lagrangian region of volume ~1 Mpc³, so the collapsed fraction would be a few tenths of a per cent if the Lyman-limit galaxies had these masses.
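The 'few tenths of a per cent' can be checked and pushed one step further through the Press-Schechter relation F_c = erfc(ν/√2). The sketch assumes a Lagrangian volume of exactly 1 (h^-1 Mpc)³ per galaxy, a round-number stand-in for the L* mass scale rather than a measured value; the implied peak height lands near the ν ~ 3 discussed for these objects in Section 5.3.

```python
from statistics import NormalDist

N_LBG = 10 ** -2.54      # equation (16), (h^-1 Mpc)^-3, Omega = 1
V_LAGRANGIAN = 1.0       # assumed Lagrangian volume per galaxy, (h^-1 Mpc)^3

f_c = N_LBG * V_LAGRANGIAN                  # collapsed fraction = volume-filling fraction
nu = NormalDist().inv_cdf(1.0 - 0.5 * f_c)  # invert F_c = erfc(nu / sqrt(2))
print(f"F_c = {100 * f_c:.2f} per cent, nu = delta_c / sigma = {nu:.2f}")
```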

Direct dynamical determinations of these masses are still lacking in most cases. Steidel et al. attempt to infer a velocity width by looking at the equivalent width of the C and Si absorption lines. These are saturated lines, and so the equivalent width is sensitive to the velocity dispersion; values in the range

$$\sigma_v \simeq 180 - 320\ \mathrm{km\,s^{-1}} \qquad (17)$$
are implied. These numbers may measure velocities which are not due to bound material, in which case they would give an upper limit to V_c/√2 for the dark halo. A more recent measurement of the velocity width of the Hα emission line in one of these objects gives a dispersion closer to 100 km s^-1 (Pettini, private communication), consistent with the median velocity width for Lyα of 140 km s^-1 measured in similar galaxies in the HDF (Lowenthal et al. 1997). Of course, these figures could underestimate the total velocity dispersion, since they are dominated by emission from the central regions only. For the present, we consider the range σ_v = 100 to 320 km s^-1, and indicate the sensitivity of the results to the assumed velocity. In practice, this uncertainty in the velocity does not produce an important uncertainty in the conclusions.

4.3 Red radio galaxies 

We have observed two galaxies, at z = 1.43 and 1.55, over an area of 1.68 × 10^-3 sr, so a minimal comoving density follows from one galaxy in this redshift range:

$$N(\Omega = 1) > 10^{-5.87}\,(h^{-1}\,\mathrm{Mpc})^{-3}. \qquad (18)$$
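The limit follows from placing one object in the comoving volume between z = 1.43 and z = 1.55 over 1.68 × 10^-3 sr. A short check, assuming the Ω = 1 comoving distance r = (2c/H_0)[1 - (1 + z)^-1/2]:

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc


def eds_comoving_distance(z):
    """Comoving distance in an Omega = 1 universe, h^-1 Mpc."""
    return 2.0 * C_OVER_H0 * (1.0 - 1.0 / math.sqrt(1.0 + z))


solid_angle = 1.68e-3  # sr
r_lo, r_hi = eds_comoving_distance(1.43), eds_comoving_distance(1.55)
volume = solid_angle / 3.0 * (r_hi ** 3 - r_lo ** 3)  # comoving shell volume
n_min = 1.0 / volume                                   # one galaxy in this volume
print(f"log10 N = {math.log10(n_min):.2f}")
```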

This figure is comparable to the density of the richest Abell clusters, and is thus in reasonable agreement with the discovery that rich high-redshift clusters appear to contain radio-quiet examples of similarly red galaxies (Dickinson 1995).

Since the velocity dispersions of these galaxies are not observed, they must be inferred indirectly. This is possible because of the known present-day Faber-Jackson relation for ellipticals. For 53W091, the large-aperture absolute magnitude is

$$M_V(z = 1.55 \mid \Omega = 1) \simeq -21.62 - 5\log_{10} h \qquad (19)$$

(measured directly in the rest frame). From our Solar-metallicity models, this would be expected to fade by about 0.9 mag between z = 1.55 and the present, for an Ω = 1 model of present age 14 Gyr (note that Bender et al. 1996 have observed a shift in the zero-point of the M-σ_v relation out to z = 0.37 of about the expected size). If we compare these numbers with the σ_v-M_V relation for Coma (m - M = 34.3 for h = 1) taken from Dressler (1984), this gives velocity dispersions in the range

$$\sigma_v = 222 \text{ to } 292\ \mathrm{km\,s^{-1}}. \qquad (20)$$

This is a very reasonable range for a giant elliptical, and we adopt it hereafter. Assuming low-density models would increase these figures by an amount smaller than the above range, so we ignore this additional uncertainty.

We note in passing that these figures also make a prediction for the metallicity:

$$\mathrm{Mg}_2 = 0.32 \text{ to } 0.35 \qquad (21)$$

(Dressler 1984), corresponding to

$$[\mathrm{Fe/H}] = 0.11 \text{ to } 0.39, \qquad (22)$$

or a metallicity of between 1.3 and 2.5 times Solar. Care is needed here, however, because this figure refers to the nuclear metallicity, whereas our spectra are effectively total. Given the metallicity gradients in low-redshift ellipticals, such a slightly super-Solar nuclear metallicity would result in an integrated mean metallicity of Solar at best (e.g. Gonzalez & Gorgas 1996; Buzzoni 1996). This means that the use of Solar-metallicity models in estimating the age of the stellar populations in these galaxies is consistent.

Having established an abundance and an equivalent circular velocity for these galaxies, we shall treat them differently in one critical way from the Lyman-α and Lyman-limit galaxies. For the latter, we take the normal Press-Schechter approach, in which the systems under study are assumed to be newly born. This may not be a bad approximation for the Lyman-α and Lyman-limit galaxies, since they are evolving rapidly and/or display high levels of star-formation activity. For the radio galaxies, conversely, this would be a very poor assumption, since the evidence is that they existed as discrete systems at redshifts much higher than the z ~ 1.5 where we see them today. Our strategy will therefore be to apply the Press-Schechter machinery at some unknown formation redshift, and see what range of redshift gives a consistent degree of inhomogeneity.



5 THE SMALL-SCALE FLUCTUATION SPECTRUM

5.1 The empirical spectrum 

Fig. 1 shows the σ(R) data which result from the Press-Schechter analysis, for three cosmologies. The σ(R) numbers measured at various high redshifts have been translated to z = 0 using the appropriate linear growth law for density perturbations (see Appendix).
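The Appendix used for this translation is not reproduced here. As a stand-in, the sketch below uses the standard integral solution for the linear growth factor in a model with matter, curvature and Λ (no radiation), D(a) = (5Ω_m/2) E(a) ∫_0^a da'/[a'E(a')]³ with E = H/H_0, which gives D = a for Ω = 1. A σ(R) inferred at redshift z is scaled to z = 0 by D(0)/D(z); the function names are illustrative.

```python
import math


def growth_factor(z, omega_m=1.0, omega_l=0.0, steps=20000):
    """Linear growth D = (5 Om / 2) E(a) * integral_0^a da' / (a' E(a'))^3 (midpoint rule)."""
    omega_k = 1.0 - omega_m - omega_l

    def hubble_e(a):
        return math.sqrt(omega_m / a ** 3 + omega_k / a ** 2 + omega_l)

    a_end = 1.0 / (1.0 + z)
    da = a_end / steps
    integral = da * sum(1.0 / ((a * hubble_e(a)) ** 3)
                        for a in ((i + 0.5) * da for i in range(steps)))
    return 2.5 * omega_m * hubble_e(a_end) * integral


def sigma_today(sigma_at_z, z, **cosmo):
    """Translate a linear-theory sigma(R) measured at redshift z to z = 0."""
    return sigma_at_z * growth_factor(0.0, **cosmo) / growth_factor(z, **cosmo)


print(f"Omega = 1:        D(0)/D(3) = {growth_factor(0.0) / growth_factor(3.0):.3f}")
print(f"Omega = 0.3 open: D(0)/D(3) = {growth_factor(0.0, 0.3) / growth_factor(3.0, 0.3):.3f}")
```

In low-density models growth freezes out at late times, so the same high-redshift σ(R) maps onto a smaller present-day amplitude than for Ω = 1.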

The open symbols give the results for the Lyman-limit (largest R) and Lyman-α (smallest R) systems. The approximately horizontal error bars show the effect of the quoted range of velocity dispersions for a fixed abundance; the vertical errors show the effect of changing the abundance by a factor 2 at fixed velocity dispersion. The locus implied by the red radio galaxies sits in between. The different points show the effects of varying collapse redshift: z_c = 2, 4, ..., 12 [lowest redshift gives lowest σ(R)]. Clearly, collapse redshifts of 6-8 are favoured for consistency with the other data on high-redshift galaxies, independent of theoretical preconceptions and independent of the age of these galaxies. This level of power (σ[R] ~ 2 for R ~ 1 h^-1 Mpc) is also in very close agreement with the level of power required to produce the observed structure in the Lyman-α forest (Croft et al. 1997), so there is a good case to be made that the fluctuation spectrum has now been measured in a consistent fashion down to R ~ 0.5 h^-1 Mpc.

The shaded region at larger R shows the results deduced from clustering data (Peacock 1997). The ±1σ confidence region was obtained by an approximation to the fractional error in Δ²(k) at k ~ 1/R. It is clear that an Ω = 1 universe requires the power spectrum at small scales to be higher than would be expected on the basis of an extrapolation from the large-scale spectrum. Depending on assumptions about the scale-dependence of bias, such a 'feature' in the linear spectrum may also be required in order to satisfy the small-scale present-day nonlinear galaxy clustering (Peacock 1997). Conversely, for low-density models, the empirical small-scale spectrum appears to match reasonably smoothly onto the large-scale data.

Figure 1. The present-day linear fluctuation spectrum required in various cosmologies. This is expressed as σ(R): the fractional rms fluctuation in density averaged in spheres of radius R. The data points are Lyman-α galaxies (open cross) and Lyman-limit galaxies (open circles). The diagonal band with solid points shows red radio galaxies with assumed collapse redshifts 2, 4, ..., 12. The vertical error bars show the effect of a change in abundance by a factor 2. The horizontal errors correspond to different choices for the circular velocities of the dark-matter haloes that host the galaxies (R scales linearly with velocity). The shaded region at large R gives the results inferred from galaxy clustering. The solid lines show Γ = 0.25 CDM predictions; for Ω = 1, MDM models with h = 0.4 and Ω_ν = 0.2 and 0.3 (lowest at left) are also shown. The large-scale normalization is σ_8 = 0.55 for Ω = 1 or σ_8 = 1 for the low-density models.



5.2 Comparison with CDM & MDM 

Fig. 1 also compares the empirical data with various physical power spectra. A CDM model (using the transfer function of Bardeen et al. 1986) with shape parameter Γ = Ωh = 0.25 is shown as a reference for all models. This has approximately the correct level of small-scale power, but significantly overpredicts intermediate-scale clustering, as discussed in Peacock (1997). The empirical shape is better described by MDM with Ωh ~ 0.4 and Ω_ν ~ 0.3. This is the lowest curve in Fig. 1c, reproduced from the fitting formula of Pogosyan & Starobinsky (1995; see also Ma 1996). However, this curve fails to supply the required small-scale power, by about a factor 3 in σ; lowering Ω_ν to 0.2 still leaves a very large discrepancy. This conclusion is in agreement with e.g. Mo & Miralda-Escudé (1994) and Ma & Bertschinger (1994), but conflicts slightly with Klypin et al. (1995), who claimed that the Ω_ν = 0.2 model was acceptable. This difference arises partly because Klypin et al. adopt a lower value for δ_c (1.33, as against 1.686 here), and also because they adopt the high normalization of σ_8 = 0.7; the net effect of these changes is to boost the model relative to the small-scale data by a factor of 1.6, which would allow marginal consistency for the Ω_ν = 0.2 model. MDM models do allow a higher normalization than the conventional figure of σ_8 = 0.55, partly because of the very flat small-scale spectrum, and also because of the effects of random neutrino velocities. However, such shifts are at the 10 per cent level (Borgani et al. 1997a, 1997b), and σ_8 = 0.7 would probably still give a cluster abundance in excess of observation. The consensus of more recent modelling is that even Ω_ν = 0.2 MDM is deficient in small-scale power (Ma et al. 1997; Gardner et al. 1997).
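The quoted factor of 1.6 can be reproduced from the two numbers that differ between the analyses; this is only a product of the ratios given in the text, not a re-analysis of either model.

```python
delta_c_here, delta_c_klypin = 1.686, 1.33
sigma8_here, sigma8_klypin = 0.55, 0.7

# A lower delta_c lowers the sigma(R) inferred from abundances (sigma = delta_c / nu),
# while a higher sigma_8 raises the model; both push the model up relative to the data.
boost = (delta_c_here / delta_c_klypin) * (sigma8_klypin / sigma8_here)
print(f"net boost of model relative to data: {boost:.2f}")
```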

All the models in Fig. 1 assume n = 1; in fact, consistency with the COBE results for this choice of σ_8 requires a significant tilt for flat models, n ~ 0.8-0.9. Over the range of scales probed by large-scale structure, changes in n are largely degenerate with changes in Ωh, but the small-scale power is more sensitive to tilt than to Ωh. Tilting the Ω = 1 models is not attractive, since it increases the tendency for model predictions to lie below the data. However, a tilted low-Ω flat CDM model would agree moderately well with the data on all scales, with the exception of the 'bump' around R ~ 30 h^-1 Mpc. Testing the reality of this feature will therefore be an important task for future generations of redshift survey.

5.3 Limits on high-redshift clustering 

An interesting aspect of these results is that the level of power on 1-Mpc scales is only moderate: σ(1 h^-1 Mpc) ~ 2. At z ~ 3, the corresponding figure would have been much lower, making systems like the Lyman-limit galaxies rather rare. For Gaussian fluctuations, as assumed in the Press-Schechter analysis, such systems will be expected to display spatial correlations which are strongly biased with respect to the underlying mass. The linear bias parameter depends on the rareness of the fluctuation and the rms of the underlying field as

$$b = 1 + \frac{\nu^2 - 1}{\nu\,\sigma} \qquad (23)$$

(Kaiser 1984; Cole & Kaiser 1989; Mo & White 1996), where ν = δ_c/σ, and σ² is the fractional mass variance at the redshift of interest.

Figure 2. The bias parameter at z = 3.2 predicted for the Lyman-limit galaxies, as a function of their assumed circular velocity. Dotted line shows Ω = 0.3 open; dashed line is Ω = 0.3 flat; solid line is Ω = 1. A substantial bias in the region of b ~ 6 is predicted rather robustly.

In this analysis, δ_c = 1.686 is assumed. Variations in this number of order 10 per cent have been suggested by authors who have studied the fit of the Press-Schechter model to numerical data. These changes would merely scale b - 1 by a small amount; the key parameter is ν, which is set entirely by the collapsed fraction. For the Lyman-limit galaxies, typical values of this parameter are ν ~ 3, and it is clear that very substantial values of bias are expected, as illustrated in Figure 2.
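A sketch of the bias estimate, assuming the Mo & White (1996) peak-background form b = 1 + (ν² - 1)/δ_c (equivalently (ν² - 1)/(νσ), since νσ = δ_c). It evaluates b at the typical ν = 3 and shows that a 10 per cent shift in δ_c simply rescales b - 1.

```python
DELTA_C = 1.686


def linear_bias(nu, delta_c=DELTA_C):
    """Assumed Mo & White (1996) peak-background bias: b = 1 + (nu^2 - 1) / delta_c."""
    return 1.0 + (nu * nu - 1.0) / delta_c


b_nominal = linear_bias(3.0)
b_shifted = linear_bias(3.0, delta_c=1.1 * DELTA_C)  # delta_c raised by 10 per cent
print(f"b(nu=3) = {b_nominal:.2f}; with delta_c +10%: {b_shifted:.2f}")
```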

This diagram shows how the predicted bias parameter varies with the assumed circular velocity, for a number density of galaxies fixed at the level observed by Steidel et al. (1996). The sensitivity to cosmological parameters is only moderate; at V_c = 200 km s^-1, we have b ≈ 4.6, 5.5 and 5.8 for the open, flat and critical models respectively. These numbers scale approximately as V_c^-0.4, and b is within 20 per cent of 6 for most plausible parameter combinations. Strictly, the bias values determined here are upper limits, since the numbers of collapsed haloes of this circular velocity could in principle greatly exceed the numbers of observed Lyman-limit galaxies. However, the undercounting would have to be substantial: increasing the collapsed fraction by a factor 10 reduces the implied bias by a factor of about 2. A substantial bias seems difficult to avoid, as has been pointed out in the context of CDM models by Baugh, Cole & Frenk (1997).
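The effect of a factor-10 undercount can be checked with a short sketch, again assuming the Mo & White form for b and inverting F_c = erfc(ν/√2) for ν. Starting from an illustrative ν = 3, the bias falls by a factor of a little under 2 in this idealized calculation.

```python
from statistics import NormalDist

DELTA_C = 1.686


def nu_from_fraction(f_c):
    """Peak height nu = delta_c / sigma from F_c = erfc(nu / sqrt(2))."""
    return NormalDist().inv_cdf(1.0 - 0.5 * f_c)


def linear_bias(nu, delta_c=DELTA_C):
    """Assumed Mo & White (1996) form b = 1 + (nu^2 - 1) / delta_c."""
    return 1.0 + (nu * nu - 1.0) / delta_c


f_obs = 2.0 * (1.0 - NormalDist().cdf(3.0))            # collapsed fraction giving nu = 3
b_obs = linear_bias(nu_from_fraction(f_obs))
b_more = linear_bias(nu_from_fraction(10.0 * f_obs))  # ten times more collapsed haloes
print(f"b drops from {b_obs:.2f} to {b_more:.2f}, a factor {b_obs / b_more:.2f}")
```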

We now compare these calculations to the recent de- 
tection by Steidel et al. (1997) of strong clustering in the 
population of Lyman-limit galaxies at z ~ 3. The evidence 
takes the form of a redshift histogram binned at Az = 0.04 
resolution over a field 8.7' x 17.6' in extent. For Q — 1 and 
2 = 3, this probes the density field using a cell with dimen- 



Conveniently, this has a volume equivalent to a sphere of 
radius 7.5h~ 1 Mpc, so it is easy to measure the bias di- 
rectly by reference to the known value of ag. Since the de- 
gree of bias is large, redshift-space distortions from coherent 
infall are small; the cell is also large enough that the dis- 
tortions of small-scale random velocities at the few hundred 
kms -1 level are also small. Using the model of equation (11) 
of Peacock (1997) for the anisotropic redshift-space power 
spectrum and integrating over the exact anisotropic window 
function, we confirm that the above simple volume argument 
should be accurate to a few per cent for reasonable power 
spectra: 



CTcdi ~ b(z = 3) 07.5(2 = 3), 



(25) 



where we define the bias factor at this scale. The results of 
Mo & White (1996) suggest that the scale-dependence of 
30 J bias should be weak. 

In order to estimate cr C eii, we have made simulations of 
synthetic redshift histograms, using the method of Poisson- 
sampled lognormal realizations described by Broadhurst, 
Taylor & Peacock (1995). We use a x 2 statistic to quan- 
tify the nonuniformity of the redshift histogram, and find 
that <7ceii — 0.9 is required in order for the field of Steidel 
et al. (1997) to be typical. It is then straightforward to ob- 
tain the bias parameter since, for a present-day correlation 
function £(r) oc r~ 1,8 , 



0-7.5(2 = 3) = as x [S/7.5] 1 ' 8 ^ x 1/4 ~ 0.146, 
implying 

6(2 = 3 I fi = 1) ~ 0.9/0.146 ~ 6.2. 



(26) 



(27) 



Steidel et al. (1997) use a rather different analysis which 
concentrates on the highest peak alone, and obtain a min- 
imum bias of 6, with a preferred value of 8. They use the 
Eke et al. (1996) value of as = 0.52, which is on the low 
side of the published range of estimates. Using as = 0.55 
would lower their preferred b to 7.6, which is satisfyingly 
close to our estimate. Note that, with both these methods, 
it is much easier to rule out a low value of b than a high one; 
given a single field, it is possible that a relatively 'quiet' re- 
gion of space has been sampled, and that much larger spikes 
remain to be found elsewhere. Henceforth, we assume that 
the Steidel et al. (1997) field is typical, since there is evi- 
dence that other fields have a similar appearance (Steidel, 
private communication) . 

Having arrived at a figure for bias if Q = 1, it is easy to 
translate to other models, since cr ce ii is observed, indepen- 
dent of cosmology. For low fi models, the cell volume will in- 
crease by a factor [D 2 dr]/[D 2 dri]; comparing with present- 
day fluctuations on this larger scale will tend to increase 
the bias. However, for low Q., two other effects increase the 
predicted density fluctuation at 2 = 3: the cluster constraint 
increases the present-day fluctuation by a factor O~ ' 56 , and 
the growth between redshift 3 and the present will be less 
than a factor of 4. Using the Appendix to calculate these 
corrections, we get 



b(z = 3\Q = 0.3) 
b(z = 3 I fi = 1) 



42 (open) 
60 (flat) 



(28) 



cell = 15.4 x 7.6 x 15.0 [h' 1 Mpc] 3 



(24) 



which suggests an approximate scaling as b ∝ Ω^{0.72} (open)
or Ω^{0.42} (flat). Multiplying the Ω = 1 figure of 6.2 by these
factors gives bias values of 2.6 (Ω = 0.3, open) or 3.7 (Ω = 0.3,
flat). The significance of this observation is thus to provide
the first convincing proof for the reality of galaxy bias. For
Ω ~ 0.3, bias is not required in the present universe, but
we now see that b > 1 is needed at z = 3 for all reasonable
values of Ω.
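The translation to Ω = 0.3 reduces to a one-line scaling of the Ω = 1 bias. This covers equation (28) and the bias values quoted above. The sketch below takes the power-law exponents 0.72 (open) and 0.42 (flat) from the text as given. It does not recompute the Appendix corrections for cell volume, cluster normalization and growth. The names are illustrative.

```python
B_OMEGA1 = 6.2  # bias at z = 3 for Omega = 1, equation (27)

# Power-law exponents quoted in the text as fits to equation (28)
EXPONENT = {"open": 0.72, "flat": 0.42}

def bias_low_density(omega, geometry):
    """Approximate bias at z = 3 for density parameter omega, b ∝ Omega**exponent."""
    return B_OMEGA1 * omega ** EXPONENT[geometry]

for geometry in ("open", "flat"):
    ratio = 0.3 ** EXPONENT[geometry]
    print(f"{geometry}: ratio {ratio:.2f}, b = {bias_low_density(0.3, geometry):.1f}")
# open: ratio 0.42, b = 2.6
# flat: ratio 0.60, b = 3.7
```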

Comparing these bias values with Fig. 2, we see that
the observed value of b is quite close to the prediction in the
case of Ω = 1. This suggests that the simplest interpretation
of these systems as collapsed rare peaks may well be roughly
correct. Indeed, for high circular velocities there is a danger
of exceeding the predictions. It would create something
of a difficulty for high-density models if a velocity as high
as V_c ~ 300 km s^{-1} were to be established as typical of the
Lyman-limit galaxies. For low Ω, the 'observed' bias is lower
than the predictions, so there is no immediate conflict. For
a circular velocity of 200 km s^{-1}, we would need to say that
the collapsed fraction was underestimated by roughly a factor
10 to close the gap in the case of an open universe. This
change in collapsed fraction increases the values of σ in Fig.
1 by a factor of about 1.5, increasing the 'observed' bias by
the same factor. At the same time, this makes ν smaller,
reducing the predicted bias by about a factor 2 and producing
agreement on a bias factor of between 3 and 4. Such a
change in F_c could come about in either of two ways. The
conversion from velocity to R could be systematically in error.
Alternatively, there may be many haloes which are not
detected by the Lyman-limit search technique. It is hard to
argue that either of these possibilities is completely ruled
out. Nevertheless, we have reached the paradoxical conclusion
that large-amplitude clustering in the early universe is
more naturally understood in an Ω = 1 model, whereas one
might have expected the opposite conclusion.



6 AGES AND COLLAPSE REDSHIFTS 

We now return to the red radio galaxies, and ask whether the
collapse redshifts inferred above are consistent with the age
data on these objects. First, bear in mind that in a hierarchy
some of the stars in a galaxy will inevitably form before the
epoch of collapse. Indeed, some direct observational evidence
for the assembly of galaxies from sub-galactic clumps may
now be starting to emerge (Pascarelle et al. 1996). At the
time of final collapse, the typical stellar age will be some
fraction α of the age of the universe at that time:

$$
\text{age} = t(z_{\rm obs}) - t(z_c) + \alpha\, t(z_c). \qquad (29)
$$

We can rule out α = 1 (i.e. all stars forming in small
sub-units just after the big bang). For present-day ellipticals, the
tight colour-magnitude relation only allows an approximate
doubling of the mass through mergers since the termination
of star formation (Bower et al. 1992). This corresponds to
α ~ 0.3 (Peacock 1991). A non-zero α just corresponds to
scaling the collapse redshift as

$$
\text{apparent } (1 + z_c) \propto (1 - \alpha)^{-2/3}, \qquad (30)
$$

since t ∝ (1 + z)^{-3/2} at high redshifts for all cosmologies.
For example, a galaxy which collapsed at z = 6 would have
an apparent age corresponding to a collapse redshift of 7.9
for α = 0.3.
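Equation (30) follows directly from equation (29). The age equals t(z_obs) - (1 - α) t(z_c), so the stars mimic a single burst at t' = (1 - α) t(z_c). With t ∝ (1 + z)^{-3/2}, that gives (1 + z_c') = (1 - α)^{-2/3} (1 + z_c). A minimal sketch of the conversion follows; the function name is hypothetical.

```python
def apparent_collapse_redshift(z_c, alpha):
    """Apparent single-burst collapse redshift, equation (30).

    A galaxy collapsing at z_c whose stars already have mean age alpha * t(z_c)
    looks like an instantaneous burst at a higher redshift. This uses
    t ∝ (1+z)**(-3/2), the high-redshift limit.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (1.0 + z_c) * (1.0 - alpha) ** (-2.0 / 3.0) - 1.0

print(f"{apparent_collapse_redshift(6.0, 0.3):.1f}")  # 7.9
```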



Converting the ages for the galaxies to an apparent collapse
redshift depends on the cosmological model, but
particularly on H_0. We can circumvent some of this uncertainty
by fixing the age of the universe. After all, it is of no interest
to ask about formation redshifts in a model with e.g. Ω = 1,
h = 0.7 when the whole universe then has an age of only 9.5
Gyr. If Ω = 1 is to be tenable, then either h < 0.5 against all
the evidence or there must be an error in the stellar evolution
timescale. If the stellar timescales are wrong by a fixed
factor, then these two possibilities are degenerate. It
therefore makes sense to measure galaxy ages only in units of
the age of the universe. Equivalently, one can choose freely
an apparent Hubble constant which gives the universe an
age comparable to that inferred for globular clusters. In this
spirit, Fig. 3 gives apparent ages as a function of effective
collapse redshift for models in which the age of the universe
is forced to be 14 Gyr (e.g. Jimenez et al. 1996).

This plot shows that the ages of the red radio galaxies
are not permitted very much freedom. We have argued
for a consistent formation redshift in the range 6 to 8 on
abundance grounds. This clearly predicts an age of close
to 3.0 Gyr for Ω = 1, or 3.7 Gyr for low-density models,
irrespective of whether Λ is nonzero. The age-z_c relation
is rather flat, and this gives a robust estimate of age once
we have some idea of z_c through the abundance arguments.
Conversely, it is almost impossible to determine the collapse
redshift reliably from the spectral data, since a very high
precision would be required both in the age of the galaxy
and in the age of the universe.
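The age-z_c curves of Fig. 3 can be regenerated approximately from the age-redshift relation. The sketch below is an illustration, not the authors' code, and its helper names are hypothetical. It integrates H_0 t(a) numerically for each model. It then rescales so that the present age is 14 Gyr, as in the figure. Finally, it returns the age at z = 1.5 of a single burst at z_c.

```python
import math

T0_GYR = 14.0  # present age of the universe, fixed as in Fig. 3

def hubble_time_fraction(a, om, ov, n=2000):
    """H0 * t(a), integrated by Simpson's rule (n must be even).

    With a' = u**2 the integrand da' / (a' H/H0) becomes
    2 u^2 / sqrt(om + ok u^2 + ov u^6), which is smooth at u = 0.
    """
    ok = 1.0 - om - ov

    def integrand(u):
        return 2.0 * u * u / math.sqrt(om + ok * u * u + ov * u**6)

    upper = math.sqrt(a)
    step = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return total * step / 3.0

def burst_age(z_obs, z_c, om, ov, t0=T0_GYR):
    """Age (Gyr) at z_obs of an instantaneous burst at z_c, present age fixed to t0."""
    scale = t0 / hubble_time_fraction(1.0, om, ov)

    def cosmic_time(z):
        return scale * hubble_time_fraction(1.0 / (1.0 + z), om, ov)

    return cosmic_time(z_obs) - cosmic_time(z_c)

MODELS = {"Omega=1": (1.0, 0.0), "open 0.3": (0.3, 0.0), "flat 0.3": (0.3, 0.7)}
for name, (om, ov) in MODELS.items():
    ages = ", ".join(f"z_c={zc}: {burst_age(1.5, zc, om, ov):.2f}" for zc in (6, 7, 8))
    print(f"{name:9s} {ages} Gyr")
```

For Ω = 1 the result is analytic, 14[(2.5)^{-3/2} - (1 + z_c)^{-3/2}] Gyr. That is about 2.8 Gyr for z_c = 6 and 3.0 Gyr for z_c = 8. The curves are shallow in z_c in every model, and the low-density models give older galaxies.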

What conclusions can then be reached about allowed
cosmological models? Suppose we take an apparent z_c = 8 from
the power-spectrum arguments. The apparent minimum
age of > 4 Gyr for 53W069 can then very nearly be satisfied in
both low-density models (a current age of 14.5 Gyr would be
required), but is unattainable for Ω = 1. In the high-density
case, a current age of 17.6 Gyr would be required to attain
the required age for z_c = 8; this requires a Hubble constant
of h = 0.38. As argued above, this conclusion is highly
insensitive to the assumed value of z_c. If the true value of h
does turn out to be close to 0.5, then it might be argued
that Ω = 1 is consistent with the data, given realistic
uncertainties. The ages for the low-density models would in this
case be large by comparison with the observed radio-galaxy
ages. However, the ages obtained by modelling spectra with
a single burst can only be lower limits to the true age for
the bulk of the stars. We could easily be observing an even
older burst which is made bluer by a little recent star
formation. A low h measurement would therefore not rule out
low-density models.

The main conclusion of this paper is thus that the
existence of old radio galaxies at z = 1.5 poses two serious
difficulties for an Ω = 1 Universe:

(i) a consistent picture of structure formation through
gravitational instability from Gaussian initial conditions
requires a high formation redshift for these objects. This
leads to an old Universe and, particularly, a very small Hubble
constant if the stellar ages of these objects are accepted;

(ii) the shape of the power spectrum is complicated, with a
large change in power between smoothing scales of
0.5 h^{-1} Mpc and 5 h^{-1} Mpc; no known model predicts a
spectrum with this shape.

The second difficulty might be avoided through non-Gaussian
statistics. The first, however, would require our age estimates for
the radio galaxies to be too high by a factor of about 1.5, which we
consider implausible. The simple solution to these problems is
of course to lower the density parameter, and either open or
flat models with Ω ~ 0.3 work quite well. The only
counter-argument is that the empirical fluctuation spectrum then
predicts a higher degree of bias for Lyman-limit galaxies at
z ~ 3 than is observed, whereas prediction and observation
match well for Ω = 1. However, this problem disappears if
the σ(R) data are increased by a factor ≲ 1.5. The question
of bias therefore does not significantly affect our claim
that the empirical small-scale fluctuation spectrum is now
measured, once the geometry of the universe is given.

Figure 3. The age of a galaxy at z = 1.5, as a function of its
collapse redshift (assuming an instantaneous burst of star
formation). The various lines show Ω = 1 [solid]; open Ω = 0.3 [dotted];
flat Ω = 0.3 [dashed]. In all cases, the present age of the universe
is forced to be 14 Gyr.

Lastly, it is interesting to note that it has been possible
to construct a consistent picture which incorporates both
the large numbers of star-forming galaxies at z < 3 and the
existence of old systems which must have formed at very
much larger redshifts. A recent conclusion from the numbers
of Lyman-limit galaxies and the star-formation rates
seen at z ~ 1 has been that the global history of star
formation peaked at z ~ 2 (Madau et al. 1996). This leaves open
two possibilities for the very old systems. Either they are the
rare precursors of this process, and form unusually early, or
they are a relic of a second peak in activity at higher redshift.
Such a second peak is commonly invoked for the origin of all spheroidal
components. While we cannot rule out such a bimodal history
of star formation, the rareness of the red radio galaxies
indicates that there is no difficulty with the former picture.
We can demonstrate this quantitatively by integrating the
total amount of star formation at high redshift. According
to Madau et al., the star-formation rate at z = 4 is

$$
10^{7.3}\, h\ M_\odot\ \text{Gyr}^{-1}\ \text{Mpc}^{-3}, \qquad (31)
$$

declining roughly as (1 + z)^{-4}. This is probably an
underestimate by a factor of at least 3. Two lines of evidence point
to this: suggestions of dust in the Lyman-limit galaxies (Pettini et al. 1997), and
the prediction of Pei & Fall (1995), based on high-z element
abundances. If we scale by a factor 3, and integrate to
find the total density in stars produced at z > 6, this yields

$$
\rho_*(z > 6) \simeq 10^{6.2}\ M_\odot\ \text{Mpc}^{-3}. \qquad (32)
$$



Since the mJy galaxies have a density of 10~ 5S7 h^3 Mpc^{-3}
and stellar masses of order 10 M_⊙, there is clearly no
conflict with the idea that these galaxies are the first stellar
systems of L* size which form en route to the general era of
star and galaxy formation.
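Equation (32) can be reproduced under one extra assumption that the text does not state: an Einstein-de Sitter time-redshift relation. In that model |dt/dz| = H_0^{-1}(1 + z)^{-5/2}, and the factors of h in equation (31) cancel. The sketch below integrates the dust-corrected rate numerically; the function names are hypothetical. The constant 9.778 is 1/H_0 in units of h^{-1} Gyr. The result is log10 ρ_* close to 6.2.

```python
import math

HUBBLE_TIME_GYR = 9.778  # 1/H0 = 9.778 h^-1 Gyr

def sfr(z, dust_factor=3.0):
    """Star-formation rate density in h Msun Gyr^-1 Mpc^-3.

    Normalised to equation (31) at z = 4, declining as (1+z)^-4 and scaled
    up by dust_factor (the factor of 3 discussed in the text).
    """
    return dust_factor * 10**7.3 * ((1.0 + z) / 5.0) ** -4

def abs_dt_dz(z):
    """|dt/dz| in h^-1 Gyr, assuming an Einstein-de Sitter model."""
    return HUBBLE_TIME_GYR * (1.0 + z) ** -2.5

def stellar_density_above(z_min, n=1000):
    """Msun Mpc^-3 of stars formed at z > z_min (h cancels), via Simpson's rule.

    Substituting y = 1/(1+z) maps [z_min, inf) to (0, 1/(1+z_min)], with
    dz = -dy / y**2, giving a finite, smooth integrand.
    """
    y_max = 1.0 / (1.0 + z_min)

    def integrand(y):
        if y == 0.0:
            return 0.0
        z = 1.0 / y - 1.0
        return sfr(z) * abs_dt_dz(z) / y**2

    step = y_max / n
    total = integrand(0.0) + integrand(y_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    return total * step / 3.0

rho = stellar_density_above(6.0)
print(f"log10 rho_*(z > 6) = {math.log10(rho):.2f}")
```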
APPENDIX: FORMULAE FOR GENERAL COSMOLOGIES

If a nonzero cosmological constant is allowed, not all of the
important cosmological formulae exist as analytical expressions,
but in many cases accurate approximation formulae
may be used: see Carroll, Press & Turner (1992). For
convenience, we summarize the necessary expressions here.

In general, it is necessary to distinguish matter (m) and
vacuum (v) contributions to the total density parameter.
Both these parameters and the Hubble parameter vary with
scale factor a = 1/(1 + z):

$$
H[a] = H_0 \left[ \Omega_v (1 - a^{-2}) + \Omega_m (a^{-3} - a^{-2}) + a^{-2} \right]^{1/2} \qquad (33)
$$

$$
\Omega_m[a] = \frac{\Omega_m}{a + \Omega_m (1 - a) + \Omega_v (a^3 - a)} \qquad (34)
$$

$$
\Omega_v[a] = \frac{\Omega_v\, a^3}{a + \Omega_m (1 - a) + \Omega_v (a^3 - a)} \qquad (35)
$$

The age of the universe (at a given epoch, taking
the appropriate redshift-dependent H and Ω) can be
approximated to a few per cent by

$$
H(a)\, t(a) \simeq \frac{2}{3}\, |1 - f|^{-1/2}\, \mathrm{Sh}^{-1}\!\left[ \left( \frac{|1 - f|}{f} \right)^{1/2} \right] \qquad (36)
$$

where f = 0.7Ω_m - 0.3Ω_v + 0.3 and Sh is sinh if f < 1,
otherwise sin.

The increment of comoving distance is

$$
R_0\, dr = \frac{c}{H_0}\, \frac{dz}{\left[ (1 - \Omega)(1 + z)^2 + \Omega_v + \Omega_m (1 + z)^3 \right]^{1/2}} \qquad (37)
$$

where Ω = Ω_m + Ω_v. This integrates to

$$
R_0\, S_k(r) = \frac{c}{H_0}\, |1 - \Omega|^{-1/2}\, S_k\!\left[ |1 - \Omega|^{1/2} \int_0^z \frac{dz'}{\left[ (1 - \Omega)(1 + z')^2 + \Omega_v + \Omega_m (1 + z')^3 \right]^{1/2}} \right] \qquad (38)
$$

For the linear growth of density perturbations, there is
a density-dependent suppression of the Ω = 1 linear growth
law:

$$
\sigma(a) \propto a\, g[\Omega_m(a), \Omega_v(a)], \qquad (39)
$$

where a high-accuracy fitting formula is

$$
g(\Omega_m, \Omega_v) \simeq \frac{5}{2}\,\Omega_m \left[ \Omega_m^{4/7} - \Omega_v + \left(1 + \frac{\Omega_m}{2}\right)\left(1 + \frac{\Omega_v}{70}\right) \right]^{-1}. \qquad (40)
$$

The required growth factor is then

$$
\frac{\sigma(z)}{\sigma(0)} = a\, \frac{g[\Omega_m(a), \Omega_v(a)]}{g[\Omega_m(0), \Omega_v(0)]}. \qquad (41)
$$
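The appendix formulae are simple enough to code directly. The sketch below uses Python with the standard library only and implements equations (33) to (41). As a check of the "few per cent" claim for equation (36), it compares the fitting formula with a direct numerical integration of the age. The function names, the Simpson integrator and the a = u^2 substitution are illustrative choices, not part of the original text. Times are in units of 1/H_0 and distances in units of c/H_0.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with an even number n of intervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * step)
    return total * step / 3.0

def hubble(a, om, ov):
    """H(a)/H0, equation (33)."""
    return math.sqrt(ov * (1 - a**-2) + om * (a**-3 - a**-2) + a**-2)

def omegas(a, om, ov):
    """(Omega_m(a), Omega_v(a)), equations (34) and (35)."""
    d = a + om * (1 - a) + ov * (a**3 - a)
    return om / d, ov * a**3 / d

def age_approx(a, om, ov):
    """H0 * t(a) from the fitting formula (36)."""
    om_a, ov_a = omegas(a, om, ov)
    f = 0.7 * om_a - 0.3 * ov_a + 0.3
    x = abs(1 - f)
    if x < 1e-12:  # Einstein-de Sitter limit: H t = 2/3
        ht = 2.0 / 3.0
    else:
        arg = math.sqrt(x / f)
        inverse_sh = math.asinh(arg) if f < 1 else math.asin(arg)
        ht = 2.0 / 3.0 * inverse_sh / math.sqrt(x)
    return ht / hubble(a, om, ov)

def age_exact(a, om, ov):
    """H0 * t(a) by direct integration; a' = u**2 removes the a' -> 0 singularity."""
    ok = 1 - om - ov
    return simpson(lambda u: 2 * u * u / math.sqrt(om + ok * u * u + ov * u**6),
                   0.0, math.sqrt(a))

def comoving_distance(z, om, ov):
    """R0 S_k(r) in units of c/H0, equations (37) and (38)."""
    omega = om + ov
    integral = simpson(
        lambda zp: 1 / math.sqrt((1 - omega) * (1 + zp)**2 + ov + om * (1 + zp)**3),
        0.0, z)
    k = math.sqrt(abs(1 - omega))
    if k < 1e-12:  # flat: S_k is the identity
        return integral
    return (math.sinh(k * integral) if omega < 1 else math.sin(k * integral)) / k

def g(om_a, ov_a):
    """Growth-suppression fitting formula, equation (40)."""
    return 2.5 * om_a / (om_a**(4 / 7) - ov_a + (1 + om_a / 2) * (1 + ov_a / 70))

def growth(z, om, ov):
    """sigma(z)/sigma(0), equation (41)."""
    a = 1 / (1 + z)
    return a * g(*omegas(a, om, ov)) / g(om, ov)

for name, (om, ov) in {"EdS": (1.0, 0.0), "open": (0.3, 0.0), "flat": (0.3, 0.7)}.items():
    print(f"{name:5s} H0t0 approx={age_approx(1, om, ov):.3f} "
          f"exact={age_exact(1, om, ov):.3f} growth(z=3)={growth(3, om, ov):.3f}")
```

For a flat model, f reduces to Ω_m and equation (36) becomes the exact sinh^{-1} expression, so the two age columns agree. For open or closed models they differ at the per-cent level.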



